Consider the problem of operating on a sequence of i.i.d. Bernoulli variables with unknown mean p
to produce a sequence of symmetric Bernoulli variables. Define the efficiency of any proposed method
to be the average number of binary output digits per input digit. The following results are proved: (A) No
method exists having efficiency greater than $-p\log_2 p - q\log_2 q$, where $q = 1 - p$. (B) Methods do exist
with efficiency arbitrarily close to the bound just given. Examples are given, and compared with other
methods in the literature. A technique for finding the methods of (B) above is given.

Key words: Bernoulli; binomial; coin-tossing; computer; efficiency; generator; random numbers;
statistics.

1. Introduction

The problem of deriving unbiased Bernoulli variables from a sequence of independent and
identically distributed Bernoulli variables with mean p has received attention repeatedly in the
literature. The most recent appearance known to this author is a paper by Simons and Hoeffding [4],
wherein are given references to other work. The approach adopted by Simons and Hoeffding
concentrates on the sample sequences, and investigates stopping rules and decision rules which specify
when a sample sequence is terminated with the issuance of a (binary) output digit, and which digit
is output. One rule is "better" than another, in their terminology, if it stops as soon as the other for
any sample sequence, and sooner for at least one sequence. They define the class of "even"
procedures, and find the best even procedure, which they denote by $Q_2$. They then find a better
(noneven) procedure $Q_3$, a lower bound for the expected length of the sample sequence, a proof that
there is no procedure which is as good as or better than all others, and a procedure $Q^*$ which is
better than $Q_3$ for small p values. Incidentally, they start by introducing the procedure of von
Neumann [5], denoted $Q_1$, which consists of taking successive pairs of digits until a mixed pair (i.e., a
pair containing both a 0 and a 1) is obtained, and outputting the second digit of the pair.
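To make the von Neumann procedure $Q_1$ concrete, here is a minimal Python sketch; it is not taken from the paper, and the function name `von_neumann` is a hypothetical choice. It reads the input two digits at a time, discards 00 and 11, and outputs the second digit of each mixed pair. The simulation at the bottom assumes p is the probability that an input digit is 1.

```python
import random

def von_neumann(bits):
    """Procedure Q_1: read the input in successive pairs, discard 00 and 11,
    and output the second digit of each mixed pair (01 -> 1, 10 -> 0)."""
    out = []
    for first, second in zip(bits[0::2], bits[1::2]):
        if first != second:
            out.append(second)
    return out

if __name__ == "__main__":
    rng = random.Random(12345)
    p, n = 0.3, 200_000                    # p = probability that an input digit is 1
    bits = [1 if rng.random() < p else 0 for _ in range(n)]
    out = von_neumann(bits)
    print("output digits per input digit:", len(out) / n)        # near p*q = 0.21
    print("fraction of 1s in the output:", sum(out) / len(out))  # near 1/2
```

A trailing unpaired digit is simply ignored, which does not affect the long-run rate.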

In this paper, the efficiency of a procedure is defined in a long-run sense, as the average
number of output digits per input digit. Thus a procedure which, for certain sample sequences, does not
terminate as quickly as one of the $Q_i$ above, and yet produces output digits at a higher average
rate, is considered more efficient. For reference, note that the efficiency of $Q_1$ is $pq \le 1/4$ (where
$q = 1 - p$), since each pair of input digits is mixed, and so yields one output digit, with probability $2pq$.

The efficiency of $Q_2$ is shown by Simons and Hoeffding to be

$$\frac{1}{2}\prod_{i=1}^{\infty}\left(1 + p^{2^i} + q^{2^i}\right)^{-1},$$

which is less than

$$\frac{1}{2}\left(1 + p^2 + q^2\right)^{-1}\left(1 + p^4 + q^4\right)^{-1} \le \frac{8}{27} < \frac{1}{3}.$$
As a matter of fact, the efficiency of $Q_3$ is also less than 1/3. The lower bound for the expected size
of the sample sequence for any procedure which looks at sample sequences in the way considered
by these authors (i.e., which consists of a stopping rule and a rule for specifying an output digit
whenever the sequence stops) is given by the authors as $p^{-1}q^{-1} - 1$. This immediately leads to an
upper bound for the efficiency of such procedures, namely $pq(1 - pq)^{-1}$. This function in turn takes
on its maximum value at $p = 1/2$, where it equals 1/3.
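The two formulas above can be checked numerically. The Python sketch below (function names are hypothetical, and the infinite product is truncated after a fixed number of factors, which is an assumption that is harmless because the factors approach 1 doubly exponentially) evaluates the efficiency of $Q_2$ and the bound $pq(1-pq)^{-1}$.

```python
def q2_efficiency(p, terms=40):
    """Efficiency of Q_2: (1/2) * prod_{i>=1} (1 + p**(2**i) + q**(2**i))**(-1),
    truncated after `terms` factors."""
    q = 1.0 - p
    eff = 0.5
    for i in range(1, terms + 1):
        eff /= 1.0 + p ** (2 ** i) + q ** (2 ** i)
    return eff

def stopping_rule_bound(p):
    """Upper bound pq / (1 - pq) implied by the expected-length bound 1/(pq) - 1."""
    pq = p * (1.0 - p)
    return pq / (1.0 - pq)

for p in (0.1, 0.3, 0.5):
    print(p, round(q2_efficiency(p), 4), round(stopping_rule_bound(p), 4))
```

At p = 1/2 the truncated product gives about 0.294, just under 8/27 (about 0.296) and the bound 1/3.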

The procedure $Q_2$ defined by Simons and Hoeffding is equivalent to one presented by this
author [2] at the 1964 Institute of Mathematical Statistics Meeting at Amherst, namely: stop as soon
as the block of digits in positions $(k-1)2^n + 1$ through $k \cdot 2^n$, for any positive integers k and n, is evenly split between zeros
and ones, and output the last digit; then start again.
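The stopping rule just stated can be sketched in Python as follows (a hypothetical implementation, not the author's program). It relies on one observation, which follows from the rule itself: before a stop, every completed aligned block is constant (all 0s or all 1s), since a non-constant block would have been evenly split at some level. So only a stack of (level, value) pairs needs to be kept, much like a binary counter.

```python
import random

def q2_outputs(bits):
    """Procedure Q_2 as described above: stop as soon as an aligned block of
    2**n digits is evenly split between 0s and 1s, output its last digit,
    then start again."""
    out = []
    stack = []  # completed constant blocks as (level, value); block size is 2**level
    for b in bits:
        level, value = 0, b
        while stack and stack[-1][0] == level:
            _, prev = stack.pop()
            if prev != value:
                # Two constant halves with different values: the block of size
                # 2**(level+1) is evenly split, so output its last digit and restart.
                out.append(b)
                stack.clear()
                break
            level += 1  # equal halves merge into a constant block twice as long
        else:
            stack.append((level, value))
    return out

if __name__ == "__main__":
    rng = random.Random(3)
    n = 200_000
    bits = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    print("output digits per input digit:", len(q2_outputs(bits)) / n)
```

On the four 8-digit sequences discussed in the next paragraph, this sketch gives the outputs 0, 1, 0, 1 stated there, and for p = 1/2 the simulated rate should come out close to the value 0.294 given by the product formula.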

The procedure $Q_3$ is an ingenious improvement over $Q_2$. Simons and Hoeffding noted that there
are circumstances when it is certain that the next digit (whatever it turns out to be) will result in
termination, and so there is no need to take that last digit, if one simply decides in the beginning to
output the next-to-last digit (whether or not the last digit is needed). An example is in order at this
point. The sequences 00001100 and 00000011 are equally likely, as are the sequences 00000010 and
00000001. Under $Q_2$, these sequences will lead to termination, with output 0, 1, 0, 1, respectively. If we
were to output the next-to-last digit at termination, the output would be 0, 1, 1, 0, respectively. Now
note that in all but the last sequence, we know before the eighth digit is taken that it will be the last
digit. The procedure $Q_3$ merely takes advantage of this fact, by stopping as soon as it becomes
known that termination will occur on the next digit. (Obviously, the last digit taken is the output of
$Q_3$ whenever $Q_3$ stops sooner than $Q_2$, while the next-to-last digit is the output when $Q_3$ stops at the
same point as $Q_2$.)

In the next section, an intuitive derivation is given of a different kind of procedure, which grew
out of a suggestion by Herman Rubin to the author during the Amherst meeting previously
mentioned. Following sections present a rigorous evaluation of the method, examples, and proofs.²

2. A Different Approach 

Consider again the von Neumann procedure above. It consists of taking pairs of observations,
ignoring any pair consisting of two zeros or of two ones, and putting out one digit if a mixed pair
occurs. Now two zeros or two ones can happen in only one way each, but a mixed pair can be either 01
or 10, and one is as likely as the other. Thus another way of looking at the von Neumann procedure
is in terms of rearrangements of the result that occurred: since there is only one arrangement possible
for the sequence 11, no output can be obtained, but a mixed sequence can happen in either of
two equally likely ways which are assigned the (output) values 0 and 1 (in arbitrary order).

Now let us extend this technique. Suppose four input digits are examined together. If all are
alike, then no rearrangements are possible, hence no output. If there is exactly one 0, or three 0's,
then there are four equally likely rearrangements: 1110, 1101, 1011, 0111 in the former case, and
the complements in the latter. We can assign output sequences 00, 01, 10, and 11 to these four
possibilities, and thus obtain two output digits. But what if there are two 1's and two 0's? Then there are
$\binom{4}{2} = 6$ equally likely possibilities. We can take four of them, and derive two binary digits as
above; but we can derive only one binary digit when one of the remaining two possibilities occurs.
Thus we will get 0, 1, or 2 output digits for 4 input digits, with probabilities $p^4 + q^4$, $2p^2q^2$, and
$4p^3q + 4pq^3 + 4p^2q^2$ respectively, for an average of $2pq - 3p^2q^2/2$ output digits per input digit. This value is
at least $(13/8)pq$, incidentally (because $pq \le 1/4$), so that this procedure is considerably more efficient than the von
Neumann procedure, even for groups of four; it can be made still more efficient for larger groups, as will
be seen.
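The probabilities and the average above can be verified exactly. The Python sketch below (helper names are hypothetical) uses exact `Fraction` arithmetic. It assumes that a set with m equally likely arrangements is split into groups whose sizes are the powers of 2 in the binary expansion of m, so 6 = 4 + 2 gives 2 digits for four arrangements and 1 digit for the other two, as in the text. It then checks the formula $2pq - 3p^2q^2/2$ and the $(13/8)pq$ lower bound for a few values of p.

```python
from fractions import Fraction
from math import comb

def expected_digits(m):
    """Expected number of fair digits from one of m equally likely outcomes when
    the outcomes are split into groups of sizes 2**i from m's binary expansion
    (a group of size 2**i yields i digits)."""
    return sum(Fraction(i * 2 ** i, m) for i in range(m.bit_length()) if (m >> i) & 1)

def block_efficiency(n, p):
    """Average output digits per input digit when each set of n input digits
    is handled on its own (nothing is saved between sets)."""
    q = 1 - p
    return sum(comb(n, x) * p ** x * q ** (n - x) * expected_digits(comb(n, x))
               for x in range(n + 1)) / n

p = Fraction(3, 10)
q = 1 - p
print(block_efficiency(4, p), 2 * p * q - Fraction(3, 2) * p ** 2 * q ** 2)
```

Both printed values agree exactly. `expected_digits(6)` is 5/3 and `expected_digits(7)` is 10/7, the figures used in the next paragraph.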

There is something basically unsatisfying about this procedure as just outlined. Consider again
the simple example above with group size 4. There are six arrangements of two zeros and two ones,
which is more than the four arrangements of one zero and three ones; yet the number of output
digits is (on the average) only 5/3, which is less than the two digits obtained for the case of one zero
and three ones. The reason is obvious: 6 is not a power of 2. (For 7 arrangements, the expected
number of digits is smaller still: only 10/7.) One possible way to improve things is to save the set with six
arrangements, and combine it with the next set having six arrangements. We thus have 36 possibilities
(for the combined set): we can get 5 (= $\log_2 32$) digits with probability 32/36, and 2 digits with
probability 4/36, for an average of 7/3 digits per set. We could go further, and find a higher power of 6
which is closer to a power of 2; we could then get even more digits per set.

² The author has recently learned of a related paper by Prof. Peter Elias, to be published in the Annals of Mathematical Statistics.
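The effect of pooling sets with six arrangements can be tabulated directly. This Python sketch (hypothetical helper names, and the same binary-expansion grouping assumption as in the text) computes the expected digits per set when j such sets are pooled into $6^j$ equally likely outcomes.

```python
from fractions import Fraction
from math import log2

def expected_digits(m):
    """Expected number of fair digits from one of m equally likely outcomes when the
    outcomes are split into groups of the sizes 2**i appearing in m's binary
    expansion (a group of size 2**i yields i digits)."""
    return sum(Fraction(i * 2 ** i, m) for i in range(m.bit_length()) if (m >> i) & 1)

# Pool j sets that each showed one of 6 equally likely arrangements.
for j in range(1, 9):
    per_set = expected_digits(6 ** j) / j
    print(f"j={j}: {per_set} = {float(per_set):.4f} digits per set")
print(f"limit log2(6) = {log2(6):.4f}")
```

j = 1 and j = 2 reproduce the 5/3 and 7/3 computed above. The gain is not monotone in j: j = 3 gives only about 2.11 (216 = 128 + 64 + 16 + 8), whereas j = 7 gives about 2.53, because $6^7 = 279936$ is only slightly above $2^{18} = 262144$.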

This simple example embodies the key idea of the method. In the next section, a more complete
discussion of the technique is presented. Two lemmas and a theoretical upper bound for efficiency
are given, and used to show the existence of procedures with efficiencies arbitrarily close to the
upper bound. In the following section, this bound is then compared with the efficiencies of the $Q_i$ and of selected examples of
the procedures.

3. Permutation Methods 

Consider sets of n independent and identically distributed Bernoulli variables with mean p. Any
such set will have from zero to n 1's. The probability that a set will have x 1's is $\binom{n}{x}p^x q^{n-x}$, and a set
with x 1's can have any of $\binom{n}{x}$ equally likely configurations. Let $m = \binom{n}{x}$. If m is a power of 2, one can
number all of the m configurations with $(\log_2 m)$-digit binary numbers, in such a way that each such
number is used exactly once. Then when one of the m configurations occurs, one can simply output
the $\log_2 m$ digits in the number assigned to that configuration. If m is not a power of 2, a record may
be kept of which configuration occurred (i.e., a number between 1 and m may be recorded). In future
sets of n digits, every value of m will (with probability one) come up repeatedly, so that one obtains
sequences of numbers between 1 and m (inclusive), one sequence for each distinct value of m. In
each sequence, each number is equally likely to take on any of the m values, of course. With j(m)
numbers, each equally likely to have any value between 1 and m, one has $m^{j(m)}$ equally likely
possibilities. It will be shown below that the integer j(m) can be chosen so that $m^{j(m)}$ is only slightly larger
than a power of 2. Thus with high probability one obtains very nearly $\log_2 m$ digits for each of the j(m)
sets.
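The procedure just outlined can be put together as a small simulation. The following Python sketch is an illustration under stated assumptions, not the author's implementation: the dictionary `j_of_m` (mapping m to j(m), default 1) and all function names are hypothetical; configurations are numbered from 0 to m − 1 in enumeration order; the j saved numbers are combined as $n_1 + n_2 m + \cdots + n_j m^{j-1}$; and digits are extracted with the binary-expansion grouping used in the four-digit example.

```python
import random
from itertools import product
from math import comb

def make_permutation_procedure(n, j_of_m):
    """Return feed(digits): accepts one set of n input digits, returns the output
    digits released (possibly none). j_of_m maps m = C(n, x) to j(m); values of
    m not listed use j(m) = 1 (an illustrative interface)."""
    # Number the configurations with x ones as 0, 1, ..., m - 1 (any fixed order works).
    index, seen = {}, {}
    for c in product((0, 1), repeat=n):
        x = sum(c)
        index[c] = seen.get(x, 0)
        seen[x] = index[c] + 1
    pools = {}  # m -> indices saved so far for that m

    def fair_digits(y, M):
        # y is uniform on 0..M-1: check bits of y where M has a 1, from the top;
        # at the first such bit that is 0, the lower-order bits of y are fair.
        for i in range(M.bit_length() - 1, -1, -1):
            if (M >> i) & 1 and not (y >> i) & 1:
                return [(y >> k) & 1 for k in range(i - 1, -1, -1)]
        return []  # not reached when 0 <= y < M

    def feed(digits):
        c = tuple(digits)
        m = comb(n, sum(c))
        pool = pools.setdefault(m, [])
        pool.append(index[c])
        if len(pool) < j_of_m.get(m, 1):
            return []                                    # keep saving
        y = sum(k * m ** r for r, k in enumerate(pool))  # combine the j numbers
        pools[m] = []
        return fair_digits(y, m ** len(pool))

    return feed

if __name__ == "__main__":
    rng = random.Random(2024)
    p, n, sets = 0.3, 4, 100_000
    feed = make_permutation_procedure(n, {6: 7})  # 6**7 is just above 2**18
    out = []
    for _ in range(sets):
        out += feed([1 if rng.random() < p else 0 for _ in range(n)])
    print("output digits per input digit:", len(out) / (n * sets))
```

With n = 4 and j(6) = 7, a hand calculation from the binomial probabilities (not a figure from the paper) puts the long-run rate at p = 0.3 near 0.41, compared with $2pq - 3p^2q^2/2 \approx 0.354$ without pooling.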

A "permutation procedure" is a procedure which operates as just outlined. If n is the size of the
set upon which the procedure is based, one needs to specify values j(m) for each $m = \binom{n}{x}$, for
$x = 0, 1, 2, \ldots, n$. (Note that $\binom{n}{x} = \binom{n}{n-x}$.) The j in Lemma 1 and Theorem 1 below is the j(m) just
considered, and the i below is the integer such that $2^i$ is almost as big as $m^{j(m)}$; i.e., $i = \lfloor j(m)\log_2 m \rfloor$. Let
Eu denote the mathematical expectation of the random variable u. Then we have:

THEOREM 1: Given ε > 0, a sequence of independent and identically distributed Bernoulli
variables, and an integer n > 1, there exists a permutation procedure based on sets of n, with efficiency
at least $n^{-1}E\log_2\binom{n}{x} - \epsilon$, where x is the number of 1's in a set of n.

To prove this result, we will make use of the following lemma, proved in the Appendix: 

Lemma 1: Given a real number k, and an ε > 0, there exist integers i and j such that
$0 \le k - (i/j) < \epsilon/j$.

PROOF OF THEOREM 1: For each possible value of $m = \binom{n}{x}$, Lemma 1 implies the existence of
integers i and j such that $0 \le \log_2 m - (i/j) < \epsilon/2j$. A permutation procedure using these integers will
have efficiency given by

$$E_n = n^{-1}\sum_{x=0}^{n} D_x \binom{n}{x} p^x q^{n-x},$$

where $D_x$ denotes the expected number of output digits from a set of n digits having x 1's. Now $D_x$ is
at least (i/j) times the probability of getting i digits from a group of j such sets, where $i = \lfloor \log_2 m^j \rfloor$.
This probability is $2^i/m^j$. From the inequality above,

$$0 \ge i - j\log_2 m > -\epsilon/2,$$
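or equivalently,

$$1 \ge 2^i/m^j > 2^{-\epsilon/2} > e^{-\epsilon/2} > 1 - (\epsilon/2);$$

and $i/j > \log_2 m - \epsilon/2$. Therefore

$$E_n > n^{-1}\sum_{x=0}^{n}(1 - \epsilon/2)\left(\log_2\binom{n}{x} - \epsilon/2\right)\binom{n}{x}p^x q^{n-x}$$

$$= n^{-1}E\log_2\binom{n}{x} - (\epsilon/2n)\left\{E\log_2\binom{n}{x} + 1\right\} + \epsilon^2/4n$$

$$> n^{-1}E\log_2\binom{n}{x} - \epsilon.$$

(The last step is valid because $\binom{n}{x}2^{-n}$, being a probability, is at most unity, implying that $\log_2\binom{n}{x} \le n$,
so that $(\epsilon/2n)\{E\log_2\binom{n}{x} + 1\} \le \epsilon(n+1)/2n \le \epsilon$.) Q.E.D.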
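The integers i and j whose existence Lemma 1 guarantees can, for small cases, be found by direct search. This Python sketch is a brute-force illustration using floating point (adequate for modest j). It is not the construction used in the Appendix proof, and `find_i_j` is a hypothetical name. It returns the smallest j, with $i = \lfloor jk \rfloor$, such that $0 \le jk - i < \epsilon$, which is equivalent to $0 \le k - i/j < \epsilon/j$.

```python
from math import floor, log2

def find_i_j(k, eps, j_max=1_000_000):
    """Smallest j >= 1, with i = floor(j*k), such that 0 <= j*k - i < eps,
    i.e. 0 <= k - i/j < eps/j."""
    for j in range(1, j_max + 1):
        i = floor(j * k)
        if j * k - i < eps:
            return i, j
    raise ValueError("no pair found with j <= j_max")

for m in (6, 28, 56, 70):
    i, j = find_i_j(log2(m), 0.1)
    print(f"m={m}: j={j}, i={i}, m**j / 2**i = {m ** j / 2 ** i:.4f}")
```

For m = 6 this gives j = 7 and i = 18, so pooling seven sets leaves $6^7$ within a factor of about 1.07 of $2^{18}$.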

Lemma 2, also proved in the Appendix, asserts that

$$\lim_{n\to\infty} n^{-1}E\log_2\binom{n}{x} = -p\log_2 p - q\log_2 q.$$

This, then, is the limiting efficiency of permutation procedures. One can do no better, as evidenced
by the following.

THEOREM 2: Any procedure for generating symmetric Bernoulli variables from a stream of
independent and identically distributed Bernoulli variables has efficiency at most $-(p\log_2 p + q\log_2 q)$,
where p is the mean value of the independent and identically distributed variables.

PROOF: Kullback [1, p. 13] proves that for a random variable X, and a statistic Y (a function of
X), the average information in an outcome X is at least as great as that in the derived statistic Y. The
"information" referred to in this theorem is the information for discrimination in favor of one
hypothesis, $H_1$, against another one, $H_2$. Let X be an arbitrarily long (finite) sequence of independent
and identically distributed Bernoulli variables, and Y the derived sequence of symmetric Bernoulli
variables. Let n be the length of the X-sequence, and M (a random variable) the length of the
Y-sequence; let m = E(M). Let $H_1$ be the hypothesis that the mean of the elements of X is $p_0$; in terms
of Y, this hypothesis implies that the mean of the elements of Y is 1/2, regardless of $p_0$. Let $H_2$ be the
hypothesis that the mean of the X elements is in the closed unit interval. Then (in Kullback's
notation, and with $q_0 = 1 - p_0$)

$$I(1{:}2; X) = -\sum_{s=0}^{n}\binom{n}{s}p^s q^{n-s}\left(s\log_2 p_0 + (n-s)\log_2 q_0\right)$$

$$= -n(p\log_2 p_0 + q\log_2 q_0)$$

$$\ge -n(p\log_2 p + q\log_2 q),$$

with equality when $p_0 = p$.



Similarly,

$$I(1{:}2; Y) = E\left\{-\sum_{t=0}^{M}\binom{M}{t}2^{-M}\left(-t - (M-t)\right)\right\} = E(M) = m.$$



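By Kullback's theorem, $m \le -n(p\log_2 p_0 + q\log_2 q_0)$ for every $p_0$, since m does not depend on $p_0$; taking
$p_0 = p$ gives $m \le -n(p\log_2 p + q\log_2 q)$. But the efficiency of the procedure under consideration is simply the
limit of the ratio m/n as $n \to \infty$; thus the efficiency is limited by $-(p\log_2 p + q\log_2 q)$. Q.E.D.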
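Lemma 2 and Theorem 2 together say that $n^{-1}E\log_2\binom{n}{x}$ rises toward the entropy bound $-p\log_2 p - q\log_2 q$ and never exceeds it. The Python sketch below (hypothetical function names, assuming 0 < p < 1) computes this quantity with `math.lgamma`, so that large n does not overflow, and prints it next to the bound.

```python
from math import exp, lgamma, log, log2

def rate(n, p):
    """n**-1 * E[log2 C(n, x)] for x ~ Binomial(n, p), 0 < p < 1."""
    q = 1.0 - p
    total = 0.0
    for x in range(n + 1):
        log_c = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)   # ln C(n, x)
        weight = exp(log_c + x * log(p) + (n - x) * log(q))          # binomial probability
        total += weight * log_c / log(2)
    return total / n

def entropy_bound(p):
    """-p log2 p - q log2 q, the bound of Theorem 2."""
    q = 1.0 - p
    return -p * log2(p) - q * log2(q)

p = 0.3
for n in (10, 100, 1000, 10000):
    print(n, round(rate(n, p), 4), round(entropy_bound(p), 4))
```

For p = 0.3 the bound is about 0.881; the gap closes slowly, roughly like $(\log n)/n$.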

Finally, we note that there is a simpler approach to the problem of attaining high efficiency. As
can be seen from table 2, the limit of attainable efficiency for small n is not very high. Since one
approaches this limit by taking larger values of j, using more storage space and a more complicated
rule for determining the output, it would seem worthwhile to consider direct methods using a larger
value of n. In other words, if it is necessary to save (an average of) ten sets of n = 8 digits to come
close to the limit $n^{-1}E\log_2\binom{n}{x}$ derived above, we ought to compare with one set of n = 80. How does one
use such a set? Suppose x is the number of 1's that occur. Then $m = \binom{80}{x}$. One numbers the m
configurations arbitrarily from 0 to m − 1, and sets off a group of size equal to the largest power of
2 less than (or equal to) m; then from what is left, a group of size equal to the largest power of 2 less
than or equal to the number remaining; then with what is still left, repeat the procedure, until the
entire group of m is exhausted. If the outcome should fall in a group of size $2^i$, one gets i digits. It is
shown at the end of the Appendix that this procedure will attain an efficiency of at least
$n^{-1}\left(E\log_2\binom{n}{x} - 2\right)$, which approaches the value $n^{-1}E\log_2\binom{n}{x}$ obtained above as n gets large.
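The simple procedures described here (called $S_n$ in the Examples section) can be evaluated exactly for moderate n. The following Python sketch uses hypothetical function names and assumes 0 < p < 1. It computes the efficiency of $S_n$ with the group-splitting rule just described, and checks it against the lower bound $n^{-1}(E\log_2\binom{n}{x} - 2)$ and the upper value $n^{-1}E\log_2\binom{n}{x}$.

```python
from math import comb, exp, lgamma, log, log2

def expected_digits(m):
    """Expected digits from one of m equally likely outcomes, grouping the outcomes
    by the powers of 2 in m's binary expansion (a group of 2**i gives i digits)."""
    return sum(i * 2 ** i / m for i in range(m.bit_length()) if (m >> i) & 1)

def binom_prob(n, x, p):
    # binomial probability computed in log space (0 < p < 1)
    return exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
               + x * log(p) + (n - x) * log(1 - p))

def s_efficiency(n, p):
    """Efficiency of S_n: each set of n digits is handled on its own."""
    return sum(binom_prob(n, x, p) * expected_digits(comb(n, x)) for x in range(n + 1)) / n

def limit_rate(n, p):
    """n**-1 * E[log2 C(n, x)]."""
    return sum(binom_prob(n, x, p) * log2(comb(n, x)) for x in range(n + 1)) / n

p = 0.3
for n in (4, 8, 20, 50, 80):
    r = limit_rate(n, p)
    print(n, round(r - 2 / n, 4), round(s_efficiency(n, p), 4), round(r, 4))
```

Each row prints the lower bound, the efficiency, and the upper value. The two bounds hold term by term: for a fixed m, the shortfall $\log_2 m - D(m)$ is the entropy of the group-size distribution $2^i/m$, which is at most 2 because each group is at most half the size of the next larger one.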

A few words are in order, at this point, about the practical implications of the techniques 
described above. Specifically, a method of obtaining the proper binary output stream from the input 
stream is needed. There are two steps to be accomplished: first, derive an index number for the 
input sequence, which identifies which of the equally likely configurations has occurred; and 
second, derive the appropriate output sequence from this index number. They will be considered in 
turn. 

Consider first the simple case of a set of n input digits, and let x be the number of 1's among the
n digits. As before, let $m = \binom{n}{x}$. We will derive an index number between 0 and m − 1 inclusive
which gives the position of the actual configuration in a sequence of the m possible configurations,
arranged in order of increasing magnitude (considered as binary numbers). Let z denote the actual
sequence (of x 1's and (n − x) 0's). Check the first bit. If it is 1, then z is not one of the $\binom{n-1}{x}$
combinations starting with 0. Since these are precisely the first $\binom{n-1}{x}$ combinations, add the binomial
coefficient $\binom{n-1}{x}$ into a counter (which will eventually contain the index number corresponding to z).
If the first bit is 0, then z is one of the first $\binom{n-1}{x}$ combinations, so don't add anything into the counter.
Now check the next bit. If it is 1, add in the appropriate number, either $\binom{n-2}{x}$ or $\binom{n-2}{x-1}$, depending
on whether the first bit was a 0 or a 1. Continue in this fashion through all the bits of z, stopping as
soon as the remaining bits of z are known to be all alike. This is really a fairly simple procedure to
program: a flow chart is shown in figure 1.

Figure 1. Flow chart for procedure assigning an index number between 0 and m − 1
inclusive to a sequence $z_n z_{n-1} z_{n-2} \ldots z_1$ containing x ones and n − x zeros; $m = \binom{n}{x}$.
[Flow chart not reproduced; recoverable labels: S = n, C = x, COUNT = 0, STOP.]
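The counting rule above can be written out in a few lines. The following Python sketch is a hypothetical transcription, not the flow chart of figure 1. The sequence is given with its first (most significant) bit first, matching $z_n z_{n-1} \ldots z_1$. Whenever a 1 is seen, the sketch adds the number of configurations that agree so far but have a 0 in that position, namely $\binom{\text{bits remaining after this one}}{\text{1's not yet passed}}$.

```python
from itertools import product
from math import comb

def config_index(z):
    """Index (0 .. C(n, x) - 1) of the bit sequence z among all sequences of the
    same length with the same number x of 1s, ordered by increasing binary value.
    z[0] is the first (most significant) bit."""
    n = len(z)
    ones_left = sum(z)   # 1s not yet passed
    count = 0
    for pos, bit in enumerate(z):
        bits_after = n - pos - 1
        if ones_left == 0 or ones_left == bits_after + 1:
            break        # the remaining bits are all alike: nothing more to add
        if bit == 1:
            # every sequence agreeing so far but with 0 here comes earlier
            count += comb(bits_after, ones_left)
            ones_left -= 1
    return count

if __name__ == "__main__":
    # configurations of two 1s among four digits, in increasing binary order
    for c in product((0, 1), repeat=4):
        if sum(c) == 2:
            print("".join(map(str, c)), config_index(c))
```

The printed indices run 0 through 5 for 0011, 0101, 0110, 1001, 1010, 1100. The early exit mirrors the text's "stopping as soon as the remaining bits of z are known to be all alike".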

For a permutation procedure, one has several (say j) sets of n digits, each set being one of m
configurations. One assigns a number between 0 and m − 1, as above, to each set of n. Then one
assigns the number $n_1 + n_2 m + n_3 m^2 + \cdots + n_j m^{j-1}$ to the combination of j sets of n, where $n_i$ is the
number assigned to the ith set of n. This is of course a number between 0 and $m^j - 1$, as required.
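The combined number is simply the j indices read as the digits of a base-m numeral, with $n_1$ as the least significant digit. A minimal Python sketch (function names hypothetical) evaluates it by Horner's rule, and includes an inverse that can be used to confirm the map is one-to-one onto $0, \ldots, m^j - 1$.

```python
def combine_indices(indices, m):
    """Combine indices n_1, ..., n_j (each in 0..m-1) into
    n_1 + n_2*m + ... + n_j*m**(j-1), evaluated by Horner's rule."""
    y = 0
    for k in reversed(indices):
        y = y * m + k
    return y

def split_index(y, m, j):
    """Inverse of combine_indices: recover [n_1, ..., n_j] from y."""
    out = []
    for _ in range(j):
        y, k = divmod(y, m)
        out.append(k)
    return out

print(combine_indices([1, 2, 3], 6))   # 1 + 2*6 + 3*36 = 121
```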

Now consider the determination of the output sequence, given the index number of the input.
Let y denote the index number of the actual result, and M the number of equally likely results (of
which this result is one). Define i by $2^i \le M < 2^{i+1}$. Then there are i + 1 bits in the binary
representation of M:

$$M = \sum_{j=0}^{i} a_j 2^j, \qquad a_i = 1;\ a_j = 0 \text{ or } 1 \text{ for all } j < i.$$

If $y < 2^i$, we should obtain i bits; if $2^i \le y < 2^i + 2^{i_1}$, where $i_1$ is the largest j (< i) with $a_j = 1$, we
should obtain $i_1$ bits; etc. Now $y < 2^i$ precisely when the ith bit of y is 0 (considering the least
significant bit as the 0th bit). And conveniently, if the ith bit of y is 0, then the i less significant bits are
symmetric Bernoulli variables, because the possible values of these bits, when y has the ith bit
equal to zero, consist of all the binary numbers from 0 to $2^i - 1$. Similarly, $2^i \le y < 2^i + 2^{i_1}$ precisely
when the ith bit of y is 1 and the $i_1$st bit is 0; and when this is true, the lower-order $i_1$ bits are
symmetric Bernoulli variables. This process can be continued through all the bits of y. In summary, the
procedure is to check in turn all bits of y corresponding to nonzero bits of M, beginning with the
most significant; as soon as one of the checked bits is found to be 0, stop, and output the bits of y
below that position.
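The extraction rule summarized above translates directly into code. In this Python sketch (the name `output_bits` is hypothetical) y is assumed uniform on $0, \ldots, M-1$. Only the bit positions where M has a 1 are examined, from the top down. At the first such position where y has a 0, the bits of y below that position are returned, most significant first.

```python
def output_bits(y, M):
    """Fair output digits from an index y that is uniform on 0..M-1."""
    if not 0 <= y < M:
        raise ValueError("need 0 <= y < M")
    for i in range(M.bit_length() - 1, -1, -1):
        if (M >> i) & 1 and not (y >> i) & 1:
            # the i lower-order bits of y are symmetric Bernoulli variables
            return [(y >> k) & 1 for k in range(i - 1, -1, -1)]
    raise AssertionError("unreachable when 0 <= y < M")

print([output_bits(y, 6) for y in range(6)])
```

For M = 6 (binary 110) the printed list is `[[0, 0], [0, 1], [1, 0], [1, 1], [0], [1]]`, which is the four-plus-two split of the six arrangements used earlier.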

4. Examples 

The primary results of this section are contained in table 1, which lists the efficiency of each of
several techniques. Among those evaluated are $Q_1$, $Q_2$, and $Q_3$, from Simons and Hoeffding; the
simple procedures outlined above (permutation procedures with j = 1) for n = 4, 8, 20, 50, and 1024,
denoted by $S_4$, $S_8$, $S_{20}$, $S_{50}$, and $S_{1024}$; and one permutation procedure denoted by $P_1$. Of course, $P_1$
needs to be specified: we require n, and the appropriate number of j values. These are given in table
2. Note that for x = 3 or 5, m = 56. The 56 sequences can be (arbitrarily) divided into two groups of
28, and index numbers from 0 to 27 can be assigned within each group. Then when x = 3 or 5 occurs,
one binary digit, 0 or 1, can be output according to which group of 28 contains the result; the index
number can be treated just as if it were a result with x = 2 or 6. This of course decreases the amount
of storage necessary and the average time between outputs.
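With n = 8, one way to realize this split, assuming the configurations have been indexed from 0 to 55, is to take indices 0..27 as the first group and 28..55 as the second. The quotient by 28 is then the output digit and the remainder is the index within the group. This is only one of the arbitrary choices the text allows; the function name is hypothetical.

```python
from math import comb

def split_56(index):
    """For n = 8 and x = 3 or 5 (m = C(8, 3) = 56): return (digit, sub_index),
    where digit tells which group of 28 holds the result and sub_index (0..27)
    can be pooled with results for x = 2 or 6 (m = C(8, 2) = 28)."""
    if not 0 <= index < comb(8, 3):
        raise ValueError("index must lie in 0..55")
    return divmod(index, comb(8, 2))

print(split_56(40))   # (1, 12)
```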
6. Appendix

Contained herein are proofs of the two lemmas, and of the lower bound for the efficiency of the
S-procedures.

LEMMA 1: Given a positive number k, and ε > 0, there exist positive integers i and j such that
$0 \le k - (i/j) < \epsilon/j$.
PROOF: If k is rational, there exist integers i, j such that

$$k - (i/j) = 0.$$

Otherwise, by Theorem 4.2, page 44 of [3], there exist positive integers i, j such that $|jk - i| < \epsilon$. If
$jk - i > 0$, we are finished. Otherwise $-\epsilon < jk - i < 0$; let

$$m = \left\lfloor \frac{1}{i - jk} \right\rfloor.$$

Then

$$m(jk - i) \ge -1 > (m+1)(jk - i) > m(jk - i) - \epsilon,$$

so that

$$0 \le (mj)k - (mi - 1) < \epsilon.$$

Q.E.D.

LEMMA 2: Let x be the number of successes in n independent and identically distributed
Bernoulli trials, with probability p of success. Then

$$\lim_{n\to\infty}\left\{n^{-1}E\log_2\binom{n}{x}\right\} = -p\log_2 p - q\log_2 q.$$

PROOF: (All logs in the sequel are to the base 2.) By Stirling's approximation,

$$\log\binom{n}{x} = n\log n - x\log x - (n-x)\log(n-x) - \tfrac{1}{2}\log\left(x - x^2/n\right) + O(1),$$

for $1 \le x \le n - 1$. Dividing by n, and substituting np + y for x, we have

$$n^{-1}\log\binom{n}{x} = \log n - (p + y/n)\log(np + y) - (q - y/n)\log(nq - y) + O(n^{-1/2})$$

$$= -\log\left\{(p + y/n)^{p + y/n}(q - y/n)^{q - y/n}\right\} + O(n^{-1/2})$$

$$= -p\log p - q\log q - (y/n)\log(p/q) - (p + y/n)\log(1 + y/np) - (q - y/n)\log(1 - y/nq) + O(n^{-1/2}).$$

Now y is a random variable, with mean zero and variance npq, so that y/n has mean zero and
variance pq/n. Since $\log\binom{n}{0} = \log\binom{n}{n} = 0$, we can write

$$n^{-1}E\log\binom{n}{x} = \sum\nolimits'\left\{-p\log p - q\log q - (y/n)\log(p/q) - (p + y/n)\log(1 + y/np) - (q - y/n)\log(1 - y/nq) + O(n^{-1/2})\right\}P_{np+y},$$

where the prime on the summation indicates that the terms corresponding to x = 0 and to x = n are
omitted, and where $P_{np+y}$ is the appropriate binomial probability. Thus

$$n^{-1}E\log\binom{n}{x} = -p\log p - q\log q - E_1 - E_2 + O(n^{-1/2}),$$

where

$$E_1 = \sum\nolimits'(p + y/n)\log(1 + y/np)\,P_{np+y}$$

and

$$E_2 = \sum\nolimits'(q - y/n)\log(1 - y/nq)\,P_{np+y}.$$

All that remains is to show that $E_1$ and $E_2$ approach 0 as n approaches ∞. The proof will be given for
$E_1$, that for $E_2$ being essentially the same.
Given a small positive quantity ε, we want to find a value $n_0$ such that for any $n > n_0$, $|E_1(n)| < \epsilon$.
Because $0 < p + (y/n) < 1$,

$$|E_1| < \sum\nolimits'\left|\log(1 + y/np)\right|P_{np+y}.$$

Now y has mean zero and variance npq. Thus by Chebyshev's inequality, we have for any a > 0 the
relation

$$\Pr\left(|y/np| > a\sqrt{q/np}\right) \le a^{-2}.$$

Therefore for $a\sqrt{q/np} < 1$,

$$|E_1| < \max\left\{\log\left(1 + a\sqrt{q/np}\right), \left|\log\left(1 - a\sqrt{q/np}\right)\right|\right\} + a^{-2}\max\left|\log(1 + y/np)\right|$$

$$= \left|\log\left(1 - a\sqrt{q/np}\right)\right| + a^{-2}\max\left\{|\log(1/np)|, |\log((n-1)/np)|\right\}.$$

Consider only ε < 1/2, and $n > p^{-2}q^{-1}$. Then $np > p^{-1}q^{-1} > p^{-1} > (n-1)/np > p^{-1}(1 - p^2 q) =
p^{-1} - pq > 1$, so $|\log(1/np)| = \log np > |\log((n-1)/np)|$. Take $a = (\epsilon/4)(np/q)^{1/2}$. Then

$$\left|\log\left(1 - a\sqrt{q/np}\right)\right| = |\log(1 - \epsilon/4)| < \epsilon/2,$$

and $a^{-2}\log np < 16(\log np)/(np\,\epsilon^2)$. Now pick $n_0$ large enough that $(\log n_0 p)/(n_0 p) < \epsilon^3/32$. Then for
$n > n_0$, $a^{-2}\log np < \epsilon/2$, so $|E_1| < \epsilon$.

Q.E.D.
(Paper 76B1&2-362) 






